1. Field of the Invention
The invention relates to a topic identification system for email collections and to an indexing system for email subject lines.
2. Description of Related Art
A fairly new type of document collection is a collection of stored email messages. Such a collection may consist of messages received by one or more individuals who explicitly store them in the collection, in which case the collection is often referred to as a “folder”. Alternatively, such a collection may consist of all the messages sent to a “discussion list”, in which case a manual or automated “list manager” may store the messages and forward them to all the subscribers to the list.
A discussion list may be public or private, and its subscribers may be members of an administrative unit, a working group, or simply a collection of people interested in the subject area covered by the list. Hereinbelow, either type of collection of stored email messages is referred to as an archive. The messages placed in an email archive, especially an archive associated with a discussion list, are generally not isolated documents but parts of conversations called “threads”. These thread groupings, which may range from two to several hundred messages in length, conventionally consist of messages having the same subject line in their standard headers, that is, headers formed according to Internet Standard RFC 822, for example, or any standard replacing it. For many discussion lists, the subject lines of all but the first message in a thread are prefixed by “Re:”, and the headers also contain references to earlier messages in the thread.
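The conventional subject-line grouping described above can be illustrated with a minimal sketch. It assumes each message is represented only by its subject string, strips any run of leading “Re:” prefixes, and groups messages whose remaining subjects match. The helper names `normalize_subject` and `group_threads`, and the choice to lowercase and collapse whitespace, are hypothetical illustrations rather than part of the described system; the sketch also ignores the In-Reply-To and References header fields that RFC 822 provides for linking messages.

```python
import re
from collections import defaultdict

# Matches one or more leading reply prefixes, case-insensitively, e.g. "Re: RE: ".
# The colon is required, so words such as "Release" are not treated as prefixes.
_REPLY_PREFIX = re.compile(r"^\s*(?:re\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Hypothetical helper: strip reply prefixes, collapse whitespace, lowercase."""
    stripped = _REPLY_PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()


def group_threads(subjects: list[str]) -> dict[str, list[int]]:
    """Hypothetical helper: map each normalized subject to the indices of its messages."""
    threads: dict[str, list[int]] = defaultdict(list)
    for index, subject in enumerate(subjects):
        threads[normalize_subject(subject)].append(index)
    return dict(threads)


if __name__ == "__main__":
    archive = [
        "Release schedule",
        "Re: Release schedule",
        "Build failures on Linux",
        "RE: Re: Release schedule",
    ]
    for subject, members in group_threads(archive).items():
        print(subject, members)
```

With the sample archive, messages 0, 1 and 3 fall into one thread and message 2 forms its own. Grouping purely on subject text is simple but fragile: unrelated messages that happen to share a subject are merged, and a thread whose subject is edited mid-conversation is split, which is why the reference headers are often consulted as well.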
Email archives, especially archives of discussion lists, are read for many purposes. For example, an engineer can gain a better understanding of the rationale behind a product feature even after the original design engineers are no longer available. A user may also read an archive to become familiar with the workings of a workgroup and its issues and concerns, to research the general subject of a mailing list, or to sample public opinion.